Introduction: We propose that active Bayesian inference, a general framework for 
decision-making, can equally be applied to interpersonal exchanges. Social cognition, 
however, entails special challenges. We address these challenges through a novel 
formulation of a formal model and demonstrate its psychological significance. 

Method: We review relevant literature, especially with regard to interpersonal 
representations, formulate a mathematical model and present a simulation study. 
The model accommodates normative models from utility theory and places them within 
the broader setting of Bayesian inference. Crucially, we endow people's prior beliefs, into 
which utilities are absorbed, with preferences of self and others. The simulation illustrates 
the model's dynamics and furnishes elementary predictions of the theory. 

Results: (1) Because beliefs about self and others inform both the desirability and 
plausibility of outcomes, in this framework interpersonal representations become beliefs 
that have to be actively inferred. This inference, akin to "mentalizing" in the psychological 
literature, is based upon the outcomes of interpersonal exchanges. (2) We show how 
some well-known social-psychological phenomena (e.g., self-serving biases) can be 
explained in terms of active interpersonal inference. (3) Mentalizing naturally entails 
Bayesian updating of how people value social outcomes. Crucially this includes inference 
about one's own qualities and preferences. 

Conclusion: We inaugurate a Bayes optimal framework for modeling intersubject 
variability in mentalizing during interpersonal exchanges. Here, interpersonal 
representations are endowed with explicit functional and affective properties. We 
suggest the active inference framework lends itself to the study of psychiatric conditions 
where mentalizing is distorted. 
INTRODUCTION 

There is growing interest in modeling behavioral and physio- 
logical responses with biologically grounded normative models, 
particularly in emerging disciplines such as neuroeconomics and 
computational psychiatry. The motivation for these develop- 
ments rests upon characterizing behavioral phenotypes in terms 
of underlying variables that have a principled functional and — in 
some instances — neurobiological interpretation. Recently, opti- 
mal decision making has been formulated as a pure inference 
problem to provide a relatively simple (active inference) frame- 
work for modeling choice behavior and inference about hidden 
states of the world generating outcomes (Friston et al., 2013). 
This is a potentially important development because it provides 
a coherent and parsimonious (Bayes) optimal model of behavior. 
This normative model is consistent with classical treatments, such 
as expected utility theory and softmax response rules, without 
calling on ad hoc parameters like inverse temperature or temporal 
discounting. This means that, in principle, one can characterize 
people's behavior in terms of prior beliefs about the world (as well 
as the confidence or precision of those beliefs). 



In this paper, we demonstrate that this approach can 
also be applied fruitfully when choices — and the underlying 
preferences — are based upon interpersonal beliefs about oneself 
and other people. Social cognition merits special analysis as it 
presents substantial challenges. An active inference framework 
can usefully address some of these, but not without new theo- 
retical considerations. In what follows, we describe the sorts of 
beliefs that may underlie interpersonal exchange and use simula- 
tions of active inference to demonstrate the behaviors that ensue. 
In subsequent work, we hope to use these simulated choices to 
explain observed behavior so as to characterize subjects in terms 
of model parameters that encode interpersonal beliefs. The rou- 
tines used for the simulations of this paper are available as part 
of the academic SPM freeware and can be adapted to a variety of 
games. 

THEORIES OF AFFECTIVELY CHARGED BELIEFS ABOUT SELF AND 
OTHERS 

Self- and other- representations often are heavily affect-laden 
and a vast literature is devoted to them. We cannot do justice 
to this entire field here and instead focus on four groups of theories 
about interpersonal representations. Firstly, "homeostatic" 
theories hold that an adequately positive self-representation is 
so important in itself that healthy humans will even sacrifice 
accurate explanations of social and psychological events to main- 
tain positive self-representations. Classic psychological-defense 
theories (Ogden, 1983; Rycroft, 1995) and attribution theo- 
ries (Bentall, 2003) fall into this group. These theories easily 
explain the biases that healthy people and psychiatric patients 
exhibit in seeing the self in rosy colors (e.g., grandiosity) or 
others in negative colors (e.g., racism) as self-representation- 
boosting manoeuvres. Hence, these theories also explain how 
the motives for one's behavior can be ulterior to the motives 
that the agent believes they are acting under. However, experimental 
support for these theories is incomplete (Moutoussis et al., 
2013). Secondly, economic theories usually consider one's 
true preferences as known to the agent, while at the same time 
their behavior may be directed at instrumentally managing their 
reputation vis-a-vis others, including deceiving them (Camerer, 
2003). Some social-psychological theories combine these utilitar- 
ian perspectives into one construct, social desirability, said to have 
both self-deceit and image-management components (Crowne 
and Marlowe, 1960). Thirdly, there is a group of theories that 
see many adult beliefs about the self and others as products of 
learnt information-processing, relatively divorced from current 
interests. Examples are the rigid "core beliefs" that people often 
hold about themselves according to some cognitive-behavioral 
theories (Waller et al., 2001) or the inaccurate beliefs formed 
when strong affects are said to overwhelm people's ability to 
think about their own mind and that of others (Allen et al., 
2008). Finally, there are theories that take into account both 
the fluidity and uncertainty of person-representations (like many 
clinical and psychological theories) and an explicit, current func- 
tional role for them (like the neuroeconomic tradition). This 
is a smaller tradition, exemplified by the "sociometer theory of 
self-esteem" (Leary et al., 1995). Here a particular aspect of self- 
representation — self-esteem — predicts whether other people are 
likely to include or exclude one from social interactions. As access 
to human (e.g., friends, partners), material (e.g., work opportu- 
nities), safety and other resources can be dramatically reduced 
by social exclusion, self-esteem helps predict the success of social 
interactions. When it comes to other-representation, the "sinister 
attribution error" theory of apparently unwarranted suspicious- 
ness (Kramer, 1994) formalizes a somewhat similar logic: that 
taking others to be less well-meaning than they are serves to min- 
imize false-negative errors in the detection of social difficulties. 
However, the theories of Leary et al. and of Kramer are qualitative, 
insufficiently general, and have not been applied to interactive 
exchanges. 

We seek to generalize the "sociometer theory" to encompass all 
self- and other- representations that can be reasonably inferred 
within interpersonal exchanges. In this paper, we provide a spe- 
cific computational example of this. Psychologically it is easy to 
appreciate how making inferences about others helps to make 
predictions: For example, "a fair person will not exploit me." 
Similarly about the self, "honest people like me are trusted." 
However, interpersonal representations may come to serve as 
preferred outcomes themselves; for example, "I'd prefer to be a 
fair person and to deal with fair people." They may summarize 
(and even hide) social, cultural and ultimately evolutionary goals 
that are not otherwise explicitly represented. 

COMPUTATIONAL CHALLENGES OF DECISION-MAKING IN SOCIAL 
EXCHANGES 

One might expect people to maximize the overt benefits that they 
extract from social interactions, such as food or mates, by log- 
ically thinking through different policies and choosing the best. 
However, such a project faces serious challenges, of which we con- 
sider three. These motivate using interpersonal representations to 
make predictions about exchanges and active inference to infer 
both representations and policies. 

The first challenge concerns the potentially explosive complex- 
ity of social cognition. As a key example, interpersonal cognition 
is recursive. In order to achieve maximum material benefit I 
need to predict how another person will react. To do this I need 
to imagine what they will decide. However, they should do the 
same — estimate what I intend to do. Therefore I have to estimate 
what they think that I intend to do. But they will do the same and 
so on, without a well-defined end. In contrast, real people in real 
situations only perform a very limited number of such recursive 
steps. We argue that using interpersonal beliefs can increase the 
effective depth of (otherwise costly) cognition. 

The second concerns the arbitrariness of the parameterization 
of many decision-making schemes. As a central example, just 
one parameter is often used to describe the precision (inverse 
noisiness) of choices given the values attached to these choices. 
This precision parameter is then fitted to observed choice behav- 
ior in an agnostic manner. The parameter in question has been 
interpreted in a number of ways that are almost impossible to 
distinguish: sometimes it is cast as intrinsic noise or error rate, 
implying that agents are incapable of more precise or deter- 
ministic choices. Sometimes, it is used to motivate a form of 
exploration, implying that there is something unknown about the 
situation and it is best not to put all one's eggs in one basket. At 
other times, it is seen as a sensitivity that reflects the change in 
behavior for a change in returns. This last interpretation is closely 
related to choice matching, whereby the preferred frequency of 
different outcomes is an increasing function of their utility and 
not a winner-takes-all preference. In learning paradigms it is also 
difficult to separate estimates of precision from the learning rate 
(Daw, 2011). Parameterization of an agent's choices in terms of 
a single noisiness parameter thus conflates error, exploration, 
choice matching and, in practice, learning rates. 

The active inference framework addresses this problem first by 
taking account of the fact that there is always uncertainty about 
outcomes. In a probabilistic sense, optimal outcomes are better 
quantified in terms of probability distributions, as opposed to 
scalar reward or utility functions. We can then separate the opti- 
mal precision over action choice (en route to the outcome), which 
describes how to best get to the desired distribution over outcomes, 
from the preferred outcome distribution itself. The former preci- 
sion can itself be optimized given beliefs about hidden states of 
the world and controlled transitions among them — through for- 
mulating choice behavior in terms of beliefs over policies. The 
precision in question is the precision of (or confidence in) beliefs 
about alternative policies. It still weighs the choice between dif- 
ferent policies, but it is no longer a free parameter! In contrast 
the precision over outcome preferences is a reward sensitivity, in 
principle testable independently of the task at hand. In the active 
inference framework there is no need for a learning rate param- 
eter as such — the optimal change of beliefs is inferred at each 
step. 

The third computational challenge rests on the difficult calcu- 
lations entailed in using a model of the world to draw inferences. 
Social inferences, for example, present a difficult inverse problem 
when disambiguating the meaning of a particular social datum: 
for example, "my partner gave me nothing" may be important 
both for self-representation ("maybe because I am worthless") 
and for other-representation ("maybe because she is horrible"). 
The framework that we describe is well suited to deal with such 
ambiguities. Their resolution rests upon prior beliefs about social 
outcomes that can be updated on the basis of experience in 
a Bayes optimal fashion. This, like all statistical inversion of 
probabilistic models, is computationally challenging; the active 
inference framework suggests a practical solution based on so- 
called Variational Bayes (a ubiquitous instance of approximate 
Bayesian inference that finesses computational complexity). In 
this paper, we will use approximate Bayesian inference to show 
how interpersonal representations are accommodated in terms of 
prior beliefs; thereby providing a normative framework within 
which to parameterize different people and their interpersonal 
beliefs. 

This paper comprises three sections. The first provides a brief 
introduction to active inference, with a special emphasis on how 
preferences and goals can be cast in terms of prior beliefs about 
eventual outcomes. This enables goal-directed behavior to be 
described purely in terms of inference about states of the world 
and subsequent behavior. The second section introduces a Trust 
game to illustrate the formal aspects of modeling interpersonal 
exchanges within this framework. The third (Results) section 
uses simulations of this game under active inference to highlight 
how interpersonal beliefs produce characteristic choice behaviors. 
We conclude with a discussion of putative applications of this 
approach to normative behavioral modeling. 

METHODS 

This section summarizes the building blocks of active inference, 
which include the following: adaptive agents are held 
to (i) set themselves desirable goals that they consider themselves 
likely to achieve; (ii) choose policies that maximize the likelihood of 
achieving these goals; and (iii) form beliefs about the world 
consistent both with their sensory observations and their goals. 
In this section, we also briefly describe a practical way of 
solving this inference problem, i.e., (iv) using an inference 
process that involves the passing of simple messages between 
cognitive modules. This Variational Bayes (VB) message pass- 
ing or updating is a simpler and more biologically plausible 
method for performing approximate Bayesian inference than 
the schemes usually considered. We then formulate a model of 
a simple interpersonal exchange and describe its implementation 
so that other researchers can use it. The definitions and 
meanings of the mathematical symbols we use are summarized in 
Table 1. 

SUMMARY OF ACTIVE INFERENCE 
Setting plausible goals 

In active inference, action elicits outcomes that are the most plau- 
sible under beliefs about how they are caused. This approach 
contrasts with normative formulations in optimal decision the- 
ory, where actions are chosen to maximize the value of outcomes 
rather than plausibility. However, beliefs about outcomes are 
not motivationally neutral — an agent believes that her actions 
will lead to good outcomes. Therefore, if the prior beliefs about 
outcomes — the agent's goals or hopes — reflect the utility of those 
outcomes, then active inference can implement optimal policies, 
effectively seeking out the outcomes with the greatest utility. 

In general, agents may have subtle reasons to distribute their 
prior beliefs over particular outcomes. They may, for example, 
use a matching law such as Herrnstein or softmax mapping to 
preserve ecological resources or to distribute goods among con- 
specifics. We model an agent's preference with a softmax function 
$\sigma(r(s_T), \beta)$ of objective returns $r$ at the outcome time $T$, so that 
prior (utilitarian) beliefs for any agent or model $m$ are written as 
follows: 

$P(s_T \mid m) = \sigma(r(s_T), \beta)$ (1) 

This describes a probability distribution over states $s_T$ at time $T$. 
Probability depends upon the return associated with each state. 
This classical utility function is expressed as a map from objective 
ultimate outcomes to prior beliefs, with the relative utility of 
different outcomes depending upon a sensitivity parameter $\beta$. 
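
The mapping from returns to prior beliefs in Equation 1 can be sketched in a few lines. This is a minimal illustration rather than the routines distributed with SPM; the function name `softmax_prior`, the three returns and the values of the sensitivity parameter are all assumptions made for the example.

```python
import math

def softmax_prior(returns, beta):
    """Map objective returns r(s_T) to prior beliefs P(s_T | m) with sensitivity beta."""
    # Subtract the largest exponent before exponentiating, for numerical stability.
    peak = max(beta * r for r in returns)
    weights = [math.exp(beta * r - peak) for r in returns]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical returns (units of play money) for three outcome states.
returns = [0.0, 20.0, 30.0]
flat = softmax_prior(returns, beta=0.0)   # indifferent agent: uniform prior
mild = softmax_prior(returns, beta=0.1)   # graded preference over outcomes
sharp = softmax_prior(returns, beta=5.0)  # close to winner-takes-all
```

With a small sensitivity the agent still assigns appreciable prior probability to the less rewarding outcomes, whereas a large sensitivity concentrates almost all prior belief on the outcome with the highest return.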

Choosing policies to achieve the plausible goals 

Suppose that an agent believes that at time $t$ they occupy a state 
$s_t$. They then need to choose a policy comprising a sequence of 
control states $\tilde{u} = \{u_t, \ldots, u_T\}$ that leads to the desired outcome 
distribution $P(s_T \mid m)$. If $\tilde{u}$ leads to a distribution over final or 
outcome states $P(s_T \mid s_t, \tilde{u})$, then success can be measured by the 
Kullback-Leibler divergence between the anticipated and desired 
distributions. The agent can then choose policies according to this 
measure of their likely success. Following Friston et al. (2013), we 
can express this formally as follows: 

$P(\tilde{u} \mid s_t, \gamma, m) = \frac{1}{Z} \exp\left(-\gamma\, D_{KL}[P(s_T \mid s_t, \tilde{u}) \,\|\, P(s_T \mid m)]\right)$ (2) 

Here, we have introduced a normalizing constant $Z$ and a confidence 
or precision parameter $\gamma$. While the softmax parameter $\beta$ in 
Equation 1 calibrates the relative utility of different outcomes, the 
precision parameter $\gamma$ encodes the confidence that desired goals 
can be reached, based on current beliefs about the world and the 
policies available. Unless otherwise stated we will use the unqualified 
term "precision" for $\gamma$. Crucially, precision has to be inferred 
so that the confidence is optimal, in relation to the current state 
(context) and beliefs about the current state and future states. 
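
The role of precision in Equation 2 can be made concrete with a small numerical sketch. The two policies, their predicted outcome distributions and the precision values below are hypothetical, and precision is simply fixed by hand here, whereas in the scheme described in this paper it is inferred.

```python
import math

def kl_divergence(p, q):
    """D_KL[p || q] for discrete distributions; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def policy_prior(predicted, desired, gamma):
    """Prior over policies proportional to exp(-gamma * D_KL[predicted || desired])."""
    scores = [-gamma * kl_divergence(p, desired) for p in predicted]
    peak = max(scores)  # stabilizing shift; it cancels in the normalization
    weights = [math.exp(s - peak) for s in scores]
    z = sum(weights)    # plays the role of the normalizing constant Z
    return [w / z for w in weights]

# Hypothetical example with two policies and two outcome states.
desired = [0.2, 0.8]          # P(s_T | m): the agent's goals
predicted = [[0.3, 0.7],      # P(s_T | s_t, u) under policy 0
             [0.9, 0.1]]      # P(s_T | s_t, u) under policy 1
low = policy_prior(predicted, desired, gamma=0.5)
high = policy_prior(predicted, desired, gamma=8.0)
```

Both settings favor policy 0, whose predicted outcomes lie closer to the desired distribution, but the higher precision makes that choice close to deterministic.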

Forming beliefs consistent with observations and goals 

In our model, agents need to perform inference about certain 
quantities. An agent's knowledge of how they interact with the 
Table 1 | Additional definitions and significance of symbols that appear in equations. 

| Symbol | Definition and significance | Formula where symbol first appears |
|---|---|---|
| $P$ | Probability mass of a discrete random variable, or probability density of a continuous random variable | $P(s_T \mid m) = \sigma(r(s_T), \beta)$ |
| $s_T$ | Outcome state: a state that the agent may arrive at at time $T$, the end of the exchange | |
| $m$ | Model of the world according to the agent. It includes all the rules of how the dynamics of the world evolve, as well as the parameters of the world that don't change as the world evolves | |
| $\beta$ | Inverse temperature over outcomes. It signifies how strongly prior (utilitarian) beliefs change as a function of the outcome measure in question (e.g., money) at the point of indifference. | |
| $\sigma$ | The Gibbs softmax function. It ascribes to each component $x_i \in x$ a probability proportional to $\exp(\beta x_i)$ | |
| $r(x)$ | Return associated with state $x$. | |
| $u_t$, $\tilde{u}$ | $u_t$ is a control state, that is, a state that the agent believes s/he will deploy at time $t$. In general this does not necessarily determine what action will be realized at time $t$, as the agent may not have full control over this. However, our agents do have such control, so $u_t$ equates with the decision about the action to take. $\tilde{u}$ is the sequence of control states believed to be taken from now to the outcome (e.g., "I will type in all the letters of my password"). | $\tilde{u} = \{u_t, \ldots, u_T\}$ |
| $\gamma$ | Precision of belief about control sequences. It signifies the confidence that the goal will be attained, if the best attainable combinations of control states are employed. | $P(\tilde{u} \mid s_t, \gamma, m) = \frac{1}{Z} \exp(-\gamma D_{KL}[P(s_T \mid s_t, \tilde{u}) \parallel P(s_T \mid m)])$ |
| $Z$ | Normalizing constant. In many cases we consider how strong beliefs are relative to each other; dividing each by their sum $Z$ ensures they add up to one, as probabilities should. | |
| $D_{KL}[P_0(x) \parallel P_1(x)]$ | Kullback-Leibler divergence between a distribution $P_0(x)$ and another distribution $P_1(x)$. It is the expectation with respect to $P_0(x)$ of the difference in surprise inherent in encountering each possible value of $x$ according to the two distributions. | |
| $\Pr$ | Probability value; $\Pr(x = a)$ is the probability that $x$ takes the value $a$. | $P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m) = \Pr(\{o_0, \ldots, o_t\} = \tilde{o}, \{s_0, \ldots, s_t\} = \tilde{s}, \{u_t, \ldots, u_T\} = \tilde{u}, \gamma)$ |
| $P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m)$ | Probability density according to the generative model $m$; i.e., the world including the agent | $P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m) = P(\tilde{o} \mid \tilde{s}, m) P(\tilde{u} \mid s_t, \gamma, m) P(\tilde{s} \mid m) P(\gamma \mid m)$ |
| $Q(\tilde{s}, \tilde{u}, \gamma \mid \mu)$ | $Q$ is the belief that the agent infers using the approximate inference scheme. Rather than being expressed in terms of probability distributions, it is expressed in terms of their "sufficient statistics" $\mu$, such as its expectation. Not to be confused with $Q$, the matrix representation of policy values. | $Q(\tilde{s}, \tilde{u}, \gamma \mid \mu) \approx \Pr(\{s_0, \ldots, s_t\} = \tilde{s}, \{u_t, \ldots, u_T\} = \tilde{u}, \gamma)$ |
| $\mu = (s_0, s_t, \tilde{u}, \gamma)$ | The specific instantiation of the sufficient statistics in our example. | $\mu = (s_0, s_t, \tilde{u}, \gamma)$ |
| $H[P(x)]$ | $H$ is the entropy of the distribution $P(x)$. It is a measure of the average surprise of this distribution | $-D_{KL}[P(s_T \mid s_t, \tilde{u}) \parallel P(s_T \mid m)] = H[P(s_T \mid s_t, \tilde{u})] + E_{P(s_T \mid s_t, \tilde{u})}[\ln P(s_T \mid m)]$ |
| $E_{P(x)}[\ln P(x)]$ | $E_{P(x)}[f(x)]$ signifies the expectation of $f(x)$ under the probability distribution $P(x)$. In active inference $\ln P(x)$ is a measure of utility. | |




Frontiers in Human Neuroscience 



www.frontiersin.org 



March 2014 | Volume 8 | Article 160 | 4 



Moutoussis et al. 



A formal model of interpersonal inference 



world can be expressed as a joint distribution over these requisite 
quantities: 

$P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m) = \Pr(\{o_0, \ldots, o_t\} = \tilde{o}, \{s_0, \ldots, s_t\} = \tilde{s}, \{u_t, \ldots, u_T\} = \tilde{u}, \gamma)$ (3) 

This probabilistic knowledge constitutes a generative model over 
observations, states, control and precision. This model is constituted 
by prior beliefs about policies $P(\tilde{u} \mid s_t, \gamma, m)$ (as specified 
by Equation 2), state transitions, the likelihood of a sequence of 
observations stemming from those states and prior beliefs about 
precision: 

$P(\tilde{o}, \tilde{s}, \tilde{u}, \gamma \mid m) = P(\tilde{o} \mid \tilde{s}, m)\, P(\tilde{s} \mid \tilde{u}, m)\, P(\tilde{u} \mid s_t, \gamma, m)\, P(\gamma \mid m)$ (4) 

Agents can use this model to infer the hidden states of the world 
$\tilde{s} = \{s_0, \ldots, s_t\}$; to determine where each policy, or sequence of 
choices, $\tilde{u} = \{u_t, \ldots, u_T\}$, is likely to lead; and to select the precision 
$\gamma$ that encodes the confidence in policy selection. Agents 
can infer hidden states, their policy and precision from observed 
outcomes by inverting the model above. To do this they have 
two assets at their disposal: their observations $\tilde{o} = \{o_0, \ldots, o_t\}$ and 
their model $m$ of choice-dependent probabilistic state transitions. 

To keep things simple, we assume a one-to-one mapping 
between observations and states of the world. This is encoded 
by an identity matrix $A$ with columns corresponding to states, 
rows corresponding to observations and elements encoding the 
likelihood of observations, $P(\tilde{o} \mid \tilde{s}, m)$, under their model. 

State transitions in an interpersonal world 

We can describe the possible states of the world as a cross prod- 
uct between a subspace which is hidden and one which can be 
observed. An example of the former is "my partner is cooperative" 
whereas an example from the latter is "they will give me noth- 
ing." We model transitions between hidden states as constrained 
by the meaning of these subspaces. The part of the world-state 
that describes my partner's traits cannot change (otherwise they 
would not be traits). The part which describes their actions will 
be a probabilistic function of what I will do. As an example, the 
action "they will give me nothing" is probable if I follow a policy 
of giving them nothing myself. 

Agents therefore describe changes in the world contingent 
upon what they do in terms of a 3-D transition matrix. This 
matrix $B(u_t)$ has one "page" for each control state $u_t$ that the 
agent can employ. Each page has columns of possible states at time 
$t$ and rows of the possible states at time $t+1$. The entries of $B$ are 
the probabilities $P(s_{t+1} \mid s_t, u_t)$. As the reader may have noticed, the 
policy-dependent probabilities in Equation 2 can be derived by 
the repeated application of $B$. 
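
The repeated application of the transition matrix can be sketched as follows. The two-state world, the labels "cooperate" and "defect" for the control states and all transition probabilities are hypothetical; only the layout (one page per control state, columns for the current state, rows for the next state) follows the description above.

```python
def step(page, belief):
    """One transition, where page[i][j] = P(s_{t+1} = i | s_t = j, u)."""
    return [sum(row[j] * belief[j] for j in range(len(belief))) for row in page]

def predict_outcome(B, policy, belief):
    """Apply the page of B selected by each control state of the policy, in order."""
    for u in policy:
        belief = step(B[u], belief)
    return belief

# Hypothetical two-state world with two control states; each column sums to one.
B = {
    "cooperate": [[0.9, 0.6],
                  [0.1, 0.4]],
    "defect":    [[0.2, 0.1],
                  [0.8, 0.9]],
}
start = [1.0, 0.0]  # the agent is certain that it occupies state 0
outcome = predict_outcome(B, ["cooperate", "defect"], start)
```

The resulting `outcome` is the predicted distribution over final states under that policy, which is the quantity compared with the desired distribution in the divergence term of Equation 2.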

A practical method for performing inference 

If agents have at their disposal a function $F$ that approximates how 
inconsistent their beliefs and observations are, they can minimize 
$F$ to maximize the chance of achieving their goals. A suitable 
function $F$ is the free energy of observations and beliefs under a 
model of the world. The reader is referred to Friston et al. (2013) 
for a full explication of free energy in active inference. For our 
purposes, we just need to know that $F$ provides a measure of the 
probability of the observations under the model: $F \approx -\ln P(\tilde{o} \mid m)$. 
This means that minimizing free energy renders observations the 
least surprising, under my model: "Given that I am likely to be 
at work in an hour (belief under model of the world) it is not 
surprising that I'm in a train station (observation); it would be 
surprising if I headed for the cinema (belief about behavior)." The 
free energy defined by a generative model is thus an objective 
function with respect to optimal behavior — where optimality is 
defined by the agent's beliefs. 
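
The relation between free energy and surprise can be checked numerically in a toy setting. The sketch below uses the standard variational free energy for a single discrete hidden state and one fixed observation; the prior and likelihood values are hypothetical and the model is far simpler than the one used in this paper.

```python
import math

def free_energy(q, joint):
    """F = sum_s q(s) * (ln q(s) - ln P(o, s)) for one fixed observation o."""
    return sum(qs * (math.log(qs) - math.log(ps))
               for qs, ps in zip(q, joint) if qs > 0.0)

# Hypothetical generative model: a prior over two hidden states and the
# likelihood, under each state, of the single observation actually made.
prior = [0.5, 0.5]
likelihood = [0.8, 0.1]
joint = [p * l for p, l in zip(prior, likelihood)]  # P(o, s)
evidence = sum(joint)                               # P(o | m)
surprise = -math.log(evidence)
posterior = [j / evidence for j in joint]           # exact P(s | o, m)

f_prior = free_energy(prior, joint)          # beliefs not yet updated
f_posterior = free_energy(posterior, joint)  # beliefs equal to the exact posterior
```

Free energy is never smaller than the surprise of the observation, and the two coincide when beliefs equal the exact posterior, so reducing free energy brings beliefs into line with the observation.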

Posterior beliefs correspond to an approximate posterior probability 
over states, policies and precision. These beliefs are parameterized 
by sufficient statistics $\mu \in \mathbb{R}^d$ such that 
$Q(\tilde{s}, \tilde{u}, \gamma \mid \mu) \approx \Pr(\{s_0, \ldots, s_t\} = \tilde{s}, \{u_t, \ldots, u_T\} = \tilde{u}, \gamma)$. 
The free energy then 
becomes a function of the sufficient statistics of the approximate 
posterior distribution. This allows us to express approximate 
Bayesian inference in terms of free energy minimization: 

$\mu_t = \arg\min_{\mu} F(\tilde{o}, \mu)$ (5) 

where actions or choices are sampled from 
$\Pr(a_t = u_t) = Q(u_t \mid \mu_t)$. This means policies are selected that lead to the least 
surprising actions and outcomes. In summary, the optimization 
of sufficient statistics (usually expectations) rests upon a generative 
model and therefore depends on prior beliefs. It is these 
beliefs that specify what is surprising and consequently optimal 
behavior in both a Bayesian and utilitarian (optimal decision 
theory) sense. 

A common scheme used to perform free-energy minimization 
is VB. Many statistical procedures used in everyday data analysis 
can be derived as special cases of VB. We will not go into technical 
details and interested readers can find a treatment of VB relevant 
to the present discussion in Friston et al. (2013). Here, we note 
that VB allows us to partition the sufficient statistics into three 
common-sense subsets: statistics describing beliefs about states 
of the world causing observations; statistics describing beliefs 
about the (future) policy $\tilde{u} = \{u_t, \ldots, u_T\}$ to be selected; and statistics 
describing beliefs about precision $\gamma$: $\mu = (s_0, s_t, \tilde{u}, \gamma)$. 
These statistics are updated with each new observation, using 
variational message passing (VMP). Each belief (about precision, 
about the state of the world etc.) is a probability distribution 
held in a "node" of a network of such beliefs, as in Figure 1. 
Each belief not only has a most-likely-value but also an uncer- 
tainty, and possibly other features, that describe the exact shape 
of the distribution. In our case, these features are encoded by the 
statistics above. In VMP, the belief distributions and their associ- 
ated parameters (sufficient statistics) are chosen from amongst a 
rich and flexible — but not unlimited — vocabulary, the so-called 
conjugate-exponential belief networks. When one of the beliefs — 
say, the sensory state — is updated via an observation, it is no 
longer consistent with the others: the free energy increases. The 
"node" of the network representing this belief then sends infor- 
mation about its new content (e.g., the expectation or mean of 
the distribution) to all the other belief "nodes" with which it is 
connected. It also sends information about the beliefs on which it 
depends to nodes sending messages, which mandates a reciprocal 
or recurrent message passing. The recipient "nodes" then adjust 
[Figure 1: two panels, "Variational updates" and "Functional anatomy."] 

FIGURE 1 | This figure illustrates the cognitive and functional anatomy 
implied by the mean field assumption used in Variational Bayes. Here, we 
have associated the variational updates of expected states with perception, of 
future control states (policies) with action selection and, finally, expected 
precision with evaluation. The updates suggest the sufficient statistics from 
each subset are passed among each other until convergence to an internally 
consistent (Bayes optimal) solution. In terms of neuronal implementation, this 
might be likened to the exchange of neuronal signals via extrinsic connections 
among functionally specialized brain systems. In this (purely iconic) schematic, 
we have associated perception (inference about the current state of the world) 
with the prefrontal cortex, while assigning action selection to the basal ganglia. 
Crucially, precision has been associated with dopaminergic projections from 
ventral tegmental area and substantia nigra. See Friston et al. (2013), whence 
this figure has been adapted, for a full description of the equations. 



their parameters, and thus change the beliefs they encode, so as 
to increase consistency with the source of the message. Of course, 
this may put them a little out of line with yet other beliefs. Hence 
messages propagate back and forth via all the connections in the 
network, changing the statistical parameters that the nodes hold, 
until free energy cannot be reduced any further and consistency 
is once again optimized (Winn and Bishop, 2005). 
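
The back-and-forth exchange of messages described above can be illustrated with a toy mean-field scheme. The sketch below is not the VMP implementation used for the simulations: it assumes just two belief "nodes" over binary variables and a hypothetical joint distribution, and it alternates the standard mean-field updates while recording the free energy.

```python
import math

def normalize(log_weights):
    """Turn unnormalized log-probabilities into a probability vector."""
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

def free_energy(qa, qb, joint):
    """Free energy of the factorized belief Q(a, b) = qa(a) * qb(b)."""
    return sum(
        qa[i] * qb[j] * (math.log(qa[i]) + math.log(qb[j]) - math.log(joint[i][j]))
        for i in range(len(qa)) for j in range(len(qb))
    )

def message_passing(joint, sweeps=20):
    """Alternate the two node updates, recording free energy after each sweep."""
    n_a, n_b = len(joint), len(joint[0])
    qa, qb = [1.0 / n_a] * n_a, [1.0 / n_b] * n_b
    history = [free_energy(qa, qb, joint)]
    for _ in range(sweeps):
        # Node a receives the current expectations held by node b ...
        qa = normalize([sum(qb[j] * math.log(joint[i][j]) for j in range(n_b))
                        for i in range(n_a)])
        # ... and node b then receives the updated expectations of node a.
        qb = normalize([sum(qa[i] * math.log(joint[i][j]) for i in range(n_a))
                        for j in range(n_b)])
        history.append(free_energy(qa, qb, joint))
    return qa, qb, history

# Hypothetical joint P(o, a, b) for one fixed observation o.
joint = [[0.30, 0.05],
         [0.05, 0.20]]
qa, qb, history = message_passing(joint)
```

Each update can only lower (or leave unchanged) the free energy, so the recorded `history` is non-increasing and settles once the two beliefs are mutually consistent.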

The simplicity and generality of this VMP scheme speaks to the 
biological plausibility of its neuronal implementation (Friston et 
al., 2013). A common objection to Bayesian schemes is that it is 
implausible that the brain performs long algebraic derivations, or 
alternatively high-dimensional numerical integration, every time 
a new task is at hand. However, evolution may have converged 
on the simplicity and efficiency of VMP — or at least something 
like it. 

Figure 1 shows the architecture of variational updates for any 
generative model of choice outcomes and hidden states that can 
be formulated as a Markov decision process. The functional 
anatomy implied by the update equations is shown (schemati- 
cally) on the right. Here the distributions over observations given 
hidden states are categorical and parameterized by the matrix $A$ as 
above. Similarly, the transition matrices $B(u_t)$ encode transition 
probabilities from one state to the next, under the current control 
state of a policy ($\tilde{u} = \{u_t, \ldots, u_T\}$). 

In the simulations that follow we used a prior over precision 
that has a gamma distribution with shape and scale parameters 
$\alpha = 8$ and $\theta = 1$. The matrix $Q$ contains the values of the 
$i$-th policy from the $j$-th current state and corresponds to the 
divergence term in Equation 2. We see that expectations about 
hidden states of the world are updated on the basis of sensory 
evidence, beliefs about state transitions and value expected under 
allowable policies. Conversely, policies are selected on the basis of 
the expected value over hidden states, while precision is monotonically 
related to value expected over hidden states and policies. 
See Friston et al. (2013) for details. 
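
As a reading aid, the gamma prior over precision can be written out explicitly. The density below assumes the standard shape-scale parameterization with the shape of 8 and scale of 1 quoted above; the symbols $\alpha$ and $\theta$ are labels chosen here for illustration and the original notation may differ.

```latex
% Gamma prior over precision in shape-scale form (assumed parameterization).
P(\gamma \mid m) = \frac{\gamma^{\alpha - 1} e^{-\gamma / \theta}}{\Gamma(\alpha)\, \theta^{\alpha}},
\qquad \alpha = 8, \quad \theta = 1
% The prior mean and variance follow directly:
\mathbb{E}[\gamma] = \alpha \theta = 8, \qquad \mathrm{Var}[\gamma] = \alpha \theta^{2} = 8
```

Under this reading the agent starts each exchange with a prior expected precision of 8 and a prior standard deviation of about 2.8.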

INFERENCES ABOUT PEOPLE IN A MODEL TASK 
The simplified trust game 

To illustrate the basic features of this formulation we construct 
a model¹ of a simplified Trust Task based on the multi-round 
Investor-Trustee game (King-Casas et al., 2008). I (self) am to 
play consecutive rounds with the Trustee (other). At each round 
I earn a wage $w^{self}$, usually set at 20 units of play money. I can 
then invest one of a discrete set of fractions $f^{self}$ 
of my wage in a joint venture with the other. The investment is 
multiplied by a gain g, representing the surplus value created by 
the joint venture (usually g = 3). The other then returns a fraction 
$f^{other}$ of the invested amount. The round ends with the following 
returns: 

$$r^{self} = w^{self} - w^{self} f^{self} + w^{self} f^{self} f^{other}$$
$$r^{other} = w^{self} f^{self} g - w^{self} f^{self} f^{other}$$
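
As a worked illustration of these returns, the following Python sketch computes both players' returns for one round. It assumes the reading in which the other repays a fraction of the invested amount, so that the self keeps the uninvested wage plus the repayment and the other keeps the multiplied investment minus the repayment. The function name `round_returns` and the example fractions are illustrative and are not taken from the authors' SPM code.

```python
def round_returns(wage, f_self, f_other, gain=3.0):
    """Return (r_self, r_other) for one round of the simplified Trust Task.

    Assumption: f_other is the fraction of the invested amount that the
    other pays back to the self.
    """
    invested = wage * f_self            # amount the self places in the venture
    repaid = invested * f_other         # amount the other returns to the self
    r_self = wage - invested + repaid   # uninvested wage plus repayment
    r_other = gain * invested - repaid  # multiplied investment minus repayment
    return r_self, r_other


# Example: wage of 20 units, high or low investment, high or low repayment.
for f_self in (0.8, 0.2):
    for f_other in (1.4, 0.4):
        print(f_self, f_other, round_returns(20.0, f_self, f_other))
```

With a wage of 20 units, an investment of 80% and a repayment of 140%, both players end the round with about 26 units.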



¹Here we present the model step-by-step; see also the Discussion section 
regarding the rationale behind specific modeling choices. 

Our Trust Task is simpler than the standard Investor-Trustee 
game, with respect to the levels of investment and repayment 
available to the players. We make available only two 
levels, thus rendering a matrix representation of the exchange 
more straightforward and allowing experimenters to enforce 
(psychologically) interesting choices. The available response 
fractions f correspond only to Cooperation (action 1) or 
Defection (action 2). A matrix of monetary returns for self and 
other that can be used for this simplified task is shown in 
Table 2. 

The task is a multi-round game — partners have to make 
decisions, taking into account long-term consequences of their 
choices. This is a difficult problem — and we will see that appro- 
priate use of interpersonal representations may be used as a 
shortcut. 

Interpersonal representations and prosocial utilities 

We now consider the issue of how preferences are constituted 
in the generative model. To construct our minimal model, we 
assume the following: 

1. Self and other are each represented by a single scalar quantity, 
"how good one is." We will call this "esteem" so that $e^s$ is how 
good the self is, while the esteem of the other is $e^o$. 

2. A "good" person, with positive esteem, is more likely to 
cooperate with an average person, all other things being equal. 

3. An average person is more likely to cooperate with a "good" 
person, other things being equal. 

The observable component of world states is disclosed by action 
($u^o$, $u^s$) and the hidden component ($e^s$, $e^o$) concerns the traits 
to be inferred. The fact that a "good" person is more likely to 
cooperate, and to attract cooperation, highlights the fact that 
esteem can augment the utility of cooperation. An analogous 
reasoning applies to defection. 

Preferential biases induced by esteem can be specified in terms 
of an augmented return that includes the payoff and esteem. 
Following the format of Table 2 we write: 



Table 2 | Trust Task monetary returns matrix with only two choices for 
each partner. 

| | Self: $u^s = 1$ (Cooperate: $f^{self,high}$) | Self: $u^s = 2$ (Defect: $f^{self,low}$) |
|---|---|---|
| Other: $u^o = 1$ (Cooperate: $f^{other,high}$) | $r^s_{11}$ (e.g., =26), $r^o_{11}$ (e.g., =26) | $r^s_{21}$ (e.g., =21), $r^o_{21}$ (e.g., =7) |
| Other: $u^o = 2$ (Defect: $f^{other,low}$) | $r^s_{12}$ (e.g., =10), $r^o_{12}$ (e.g., =42) | $r^s_{22}$ (e.g., =18), $r^o_{22}$ (e.g., =10) |

These returns are defined by payoffs $r$ [...] $> r$ [...] for the self (in the 
[...] is constructed as a sequential game, with my self playing first and is typically 
asymmetric. In the example in brackets I have a "wage" of 20 units. I can choose 
to invest $f^{self,low}$ = 20% or $f^{self,high}$ = 80%; the other can choose to return 40 
or 140% of my investment. All amounts have been rounded. 



$$r^s(u^s = 1, u^o = 1, e^s, e^o) = \beta^s r^s_{11} + e^s + e^o$$
$$r^o(u^s = 1, u^o = 1, e^s, e^o) = \beta^o r^o_{11} + e^o + e^s \qquad (7)$$



Table 3 gives the augmented returns for each combination of 
outcomes. 

With this setup observable outcomes can take just 5 values: a 
"starting state" and four outcomes, $o_2 = \{u^s = 1, u^o = 1\}$, 
$o_3 = \{u^s = 2, u^o = 1\}$ and so on, for all combinations of cooperation 
and defection. For each round, each player has to model the 
transition probabilities $P(s_T | s_t, u)$. If $r^o(u^o, u^s, e^s, e^o)$ denotes the 
augmented return for the other, self can use a softmax function to 
calculate the probabilities of actions taken by the other (following 
Equation 1): 



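$$P(u^o | u^s, e^s, e^o) = \frac{\exp\left(r^o(u^o, u^s, e^s, e^o)\right)}{\sum_{u'} \exp\left(r^o(u', u^s, e^s, e^o)\right)} \qquad (8)$$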



However, this requires that self knows the beliefs of other about 
hidden esteems ($e^s$, $e^o$). We will assume that self uses beliefs about 
their esteem to model the beliefs of the other. We will see later 
that this is not an unreasonable assumption. Furthermore, we 
assumed that players can resolve just two levels of esteem, $e^o = p$ 
(for prosocial) or $e^o = n$ (for non-social or antisocial). To further 
simplify things, we assume that the self esteem is neutral, 
$e^s = 0$. Prior beliefs about choices will then be influenced by "who 
I would like you to be" and "what I would like (us) to get." These 
simplifications create a discrete hidden state space with 10 states. 
These correspond to the five observable states, for each of the 
two levels of the other's esteem $e^o \in \{p, n\}$. The actions chosen by 
self were sampled from posterior beliefs over choices based on 
the prior beliefs over policies of Equation 2. These prior beliefs 
depended on the utilities in Table 3. 
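
The softmax step of Equation 8 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the sensitivity value `BETA`, the esteem levels of +1 and -1, and the sign convention (esteem added to the cooperative return and subtracted from the defecting return) are assumptions made for the example. The monetary returns are the example values for the other, given that the self cooperates.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def other_action_probs(r_coop, r_defect, e_self, e_other, beta):
    """Return [P(other cooperates), P(other defects)] given the self's action.

    Assumption: esteem is added to the cooperative return and subtracted
    from the defecting return.
    """
    aug_coop = beta * r_coop + e_self + e_other
    aug_defect = beta * r_defect - e_self - e_other
    return softmax([aug_coop, aug_defect])


BETA = 0.1  # hypothetical sensitivity to monetary returns

# Other's returns when the self cooperates: 26 (cooperate) vs. 42 (defect).
p_prosocial = other_action_probs(26, 42, e_self=0.0, e_other=1.0, beta=BETA)
p_antisocial = other_action_probs(26, 42, e_self=0.0, e_other=-1.0, beta=BETA)
print(p_prosocial, p_antisocial)
```

Under these assumed values a prosocial other is more likely than not to cooperate, whereas an antisocial other almost always defects.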

IMPLEMENTATION OF THE TRUST GAME IN ITERATED PLAY 

We implemented the multi-round version of the Trust game by 
using the posterior beliefs about the partner, at the end of each 
round, as the priors for the next round. 

The software routines were written using the SPM academic 
freeware platform in MATLAB (MATLAB, 2012). The SPM 
platform, including the DEM toolbox used here, is available 
under GPL (GNU General Public License, version 3, 2007). It 
can be accessed via www.fil.ion.ucl.ac.uk/spm/software/spm12. 
Additional scripts are available from the corresponding author on 
demand, also under GPL. 

Table 3 | Utility matrix for the simplified Trust task. 

| | Self: $u^s = 1$ (Cooperate) | Self: $u^s = 2$ (Defect) |
|---|---|---|
| Other: $u^o = 1$ (Cooperate) | $\beta_r r^s_{11} + e^s + e^o$ [...] | $\beta_r r^s_{21}$ [...] |
| Other: $u^o = 2$ (Defect) | [...] | $\beta_r r^s_{22}$ [...] |

The entries of Table 2 are weighted by a sensitivity parameter and then 
augmented by an interpersonal component to form socialized returns. The 
interpersonal component consists of the esteem for each partner plus the esteem for 
the other partner (weighted equally in this example). Positive esteems enhance 
cooperative utility whereas negative esteem increases the utility of defecting. 
We have assumed here that $\beta^s = \beta^o = \beta_r$. 

RESULTS 

In order to perform simulations we used the monetary values 
in Table 2. To calculate the numerical values corresponding to 
Table 3, we chose a value for $\beta_r$ such that the resulting 
probabilities according to Equation 8 would be well dispersed across 
outcomes. Furthermore, for the purposes of this demonstration, we chose 
the other to be antisocial (i.e., to have a negative esteem) and naive 
(i.e., only influenced by immediate outcomes, as per Equation 8). 
The preferences (priors) that these choices translate into for the 
other are shown in Figure 2A. The other would prefer the self to 
cooperate and the other herself to defect (cd in Figure 2A). Their 
second best preference would be mutual cooperation (cc), which 
still has a substantial monetary outcome. The other is indifferent 
about the remaining two options, in which the self defects (dc, 
dd). In Figure 2, we have included the starting state (start) as a 
potential outcome, as is required by the model specification in 
the code we used. We set the starting state probability to zero, as it 
never actually materializes as an outcome and agents do not need 
to consider a preference for it. 

The situation is a little more complicated, and more interest- 
ing, with respect to the goals of the self that this scheme gives rise 
to. These are shown in Figure 2B. Whereas our antisocial, naive 




FIGURE 2 | Pattern of social utilities in $P(s_T | m) = \sigma(r(s_T), \beta)$. (A) 
Preferences of the other. This simple other only considers observable 
states of each round: the starting state (start), and each of the four 
self-action/other-action combinations shown in Table 3. The "start" state 
is only indicated for completeness: agents correctly never consider it as an 
outcome. (B) Preferences (goals) of the self. Preferences over all 10 hidden 
states are shown; see text for detailed description. 



other did not consider separate states for prosocial vs. antisocial 
self, we endowed the self with preferences depending on the type 
of the other and hence we consider the full 10-state outcome space 
for each round of the exchange. 

Figure 2B shows that the preference of the self for mutual 
cooperation is more pronounced if the other is prosocial. As one 
might expect, given an antisocial other the second-best preference 
for self is for the other to cooperate while self defects. More inter- 
estingly, given a prosocial other the second-best preference for the 
self is to cooperate, while the prosocial other defects. Heuristically, 
self is forgiving toward prosocial but not antisocial others. 

A SINGLE ROUND 

The basic behavior of self when choosing a policy through free 
energy minimization is shown in Figure 3. Initially, self believes 
that the other is equally likely to be p or n. In other words, at the 
beginning of a series of exchanges, we assume people are agnos- 
tic as to the character or esteem of their opponent. Notice that 
although there are 10 hidden states, there are only five observable 
states — because the esteem (of the other) is hidden and has to be 
inferred. 

At the first time step self just observes the starting state and 
believes the other is equally likely to be prosocial or antisocial, 
corresponding to hidden states 1 or 6. Still, under the influence 
of their utilitarian priors self assigns a higher probability to the 
cooperative policy (control state 1). With the parameters used 
in this example, this is a modest preference: as it happens, the 



[Figure 3 panels: (A) Observed states; (B) Inferred states, including a "Full priors" column; (C) Inferred policy; (D) True states. Horizontal axis: Time.] 


FIGURE 3 | Inferences made by self during a single round, where self 
initially believes that the other is just as likely to be prosocial as 
antisocial. The numbering of states from 1 to 10 corresponds to the 10 
states in Figure 2B. (A) This shows that the observable state changed from 
state 1, the starting state, to 5, corresponding to mutual defection during 
this example round. (B) Initially the belief of self was equally shared 
between playing a prosocial partner or an antisocial partner (state 1 or 6). At 
the end of the round, belief was shared between mutual defection with a 
prosocial (s5) or antisocial (s10) partner, but no longer equally so. Defection 
made the self infer that the other was more likely to be antisocial: P(s10) > 
P(s5). The column "Full priors" corresponds to Figure 2B. (C) Control state 
1 (cooperation) is slightly favored despite agnosticism, at this stage, as to 
the type of the other. As it happened, however, the self still chose to defect, 
as choice is probabilistic. (D) The underlying true states: in this example the 
other is antisocial. 

choice selected was to defect, to which the other responded by 
also defecting. Self therefore observes outcome state 5. Finally, on 
the basis of this outcome, self infers that they are more likely to 
be playing an antisocial other, which is the case. Clearly, in a single 
round, self cannot make use of this inference. However, if we now 
replace the prior beliefs about the other with the posterior beliefs 
and play a further round, we can emulate Bayesian updating of 
beliefs about the other. We now turn to the simulation of iterated 
play using this method of updating beliefs. 
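
The idea of carrying the posterior from one round forward as the prior for the next can be sketched in a few lines of Python. This uses plain Bayes' rule over the two types of other, not the variational scheme of the simulations, and the defection probabilities in `P_DEFECT` are hypothetical values chosen for illustration.

```python
def update_type_belief(prior_p, lik_p, lik_n):
    """Posterior probability that the other is prosocial after one observation.

    prior_p: prior probability of a prosocial other.
    lik_p, lik_n: probability of the observed action under each type.
    """
    numerator = prior_p * lik_p
    return numerator / (numerator + (1.0 - prior_p) * lik_n)


# Hypothetical probability that the other defects, for each type.
P_DEFECT = {"p": 0.4, "n": 0.9}

belief = 0.5  # agnostic starting belief that the other is prosocial
history = []
for observed in ["defect", "defect", "cooperate", "defect"]:
    if observed == "defect":
        lik_p, lik_n = P_DEFECT["p"], P_DEFECT["n"]
    else:
        lik_p, lik_n = 1.0 - P_DEFECT["p"], 1.0 - P_DEFECT["n"]
    # The posterior of this round becomes the prior of the next round.
    belief = update_type_belief(belief, lik_p, lik_n)
    history.append(belief)

print(history)
```

Each observed defection lowers the belief that the other is prosocial, while an observed cooperation raises it again.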

ITERATED PLAY 

During iterated play, beliefs about the other evolve. This has a 
knock-on effect on the goals or priors for each round — that pro- 
duce a progressive change in preferred policies as one learns about 
the other and adjusts one's behavior accordingly. The result of 
a multi-round game is shown in Figure 4 and reveals several 
interesting features: 

The agent infers fairly quickly that the other is antisocial and 
reduces cooperative play. In this example, they still engage in a 
considerable amount of cooperative play: outcome state 4 in 
Figure 4C is self-cooperate, other-defect, $o_4 = \{u^s = 1, u^o = 2\}$. 
These outcomes reflect the preference of self, not a lack of 
confidence or expected precision. The evolution of expected precision 
is interesting. Precision reflects whether the available policies can 
fulfill the goals or utilitarian priors. Initially, there was a prior belief 
that fully cooperative play might be achieved, given the other 
might be prosocial. When it looked as if this was the case 
(outcome state 2 in Figure 4C), precision jumped optimistically (Figure 4D). 
However, overall, there is a slower increase in expected precision, 
as the agent realizes the true nature of the opponent (i.e., that the 
other is antisocial). This illustrative example highlights the 
important interplay between prior beliefs about outcomes, inference 
on hidden states or characteristics of opponents and, crucially, 
confidence in the ensuing beliefs. 

FIGURE 4 | (A) A sequence of 32 rounds of the simplified Trust task. Over 
the course of approximately 10 rounds, self becomes confident that other 
is antisocial. (B) This increasing belief results in a declining belief in 
(preference for) cooperating. (C) In this example the actions chosen are 
quite variable and (D) expected precision changes relatively slowly. The 
variability of responses is due to the relatively weak preferences over 
different outcomes used here; this is to illustrate how one quantity (e.g., 
expected precision) changes with respect to another (e.g., players' choices) 
over a single round or over a sequence of rounds. 

DISCUSSION 

In this paper, we applied active inference to interpersonal decision 
making. Using a simple example, we captured key aspects of single 
and repeated exchanges. This example belongs to the large family 
of partially observable Markov decision problems (POMDPs) 
but its solution is distinguished by explicit consideration of the 
agent's goals as prior distributions over outcomes. Because 
behavior depends upon beliefs, this necessarily entails beliefs that have 
precision. In other words, it is not sufficient simply to consider 
the goals of interpersonal exchange; one also has to consider the 
confidence that those goals can be attained. We have focused on 
optimizing this precision of beliefs about different policies, as 
opposed to sensitivity to different outcomes. In what follows, we 
consider the difference between sensitivity and precision. We then 
consider the nature of interpersonal inference and how it shapes 
decision-making. Finally, we discuss further developments along 
these lines. 

SENSITIVITY OVER OUTCOMES vs. PRECISION OVER POLICY CHOICE 

One of the key consequences of our formulation is the sep- 
aration of choice behavior into two components. The first is 
inherent in the prior distribution itself, which reflects goals that 
are not directly represented in the exchange — as might be codi- 
fied by various matching rules or exploratory drives. The second 
is optimized by the agent during the exchange itself in order 
to maximize utility or returns, in light of what is realistic. As 
described in Friston et al. (2013), this decomposition can be 
seen clearly by expressing the negative divergence — that consti- 
tutes prior beliefs — in terms of entropy (promoting exploration 
of allowable states) and expected utility: 



$$-D_{KL}\left[P(s_T | s_t, \pi) \,\|\, P(s_T | m)\right] = H\left[P(s_T | s_t, \pi)\right] + E_{P(s_T | s_t, \pi)}\left[\ln P(s_T | m)\right] \qquad (9)$$



Therefore minimizing the difference between attainable and 
desired outcomes can always be expressed in terms of maximiz- 
ing expected utility, under the constraint that the entropy or 
dispersion of the final outcomes is as high as possible. 
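
The decomposition of the negative divergence into entropy plus expected utility can be checked numerically. The Python sketch below does so for two made-up distributions over four final outcomes; the vectors `predicted` and `desired` are hypothetical stand-ins for the attainable and desired outcome distributions, respectively.

```python
import math


def entropy(p):
    """Shannon entropy H[p] in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def expected_log(p, q):
    """Expectation under p of ln q (expected utility when ln q is utility)."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL[p || q] in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


predicted = [0.10, 0.50, 0.20, 0.20]  # hypothetical attainable outcomes
desired = [0.05, 0.60, 0.25, 0.10]    # hypothetical desired outcomes (goals)

negative_divergence = -kl_divergence(predicted, desired)
entropy_plus_utility = entropy(predicted) + expected_log(predicted, desired)
print(negative_divergence, entropy_plus_utility)
```

The two printed quantities agree up to floating-point error, and both are non-positive because a divergence is never negative.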

This separation of choice behavior, into (context-sensitive) 
beliefs about policies vs. (context-invariant) beliefs about which 
outcomes are desirable, is reflected by an introduction of 
precision $\gamma$ to complement the softmax sensitivity $\beta$. Both parameters 
play the role of precision or sensitivity (inverse temperature). $\beta$ 
determines how sensitive prior beliefs are to rewards or the 
relative utility of different outcomes. However, this does not specify 
the confidence or precision that these outcomes can be attained. 
This is where the precision parameter $\gamma$ comes in: it encodes the 
confidence that desired outcomes can be reached, based on 
current beliefs about the world and allowable policies. For example, 
one can be very uncertain about the contingencies that intervene 
between the current state and final outcome, even if one is 
confident that a particular outcome has much greater utility than 
another. 
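
The distinct roles of the two inverse temperatures can be illustrated with a short Python sketch. It applies one softmax with a sensitivity (playing the role of $\beta$) to outcome returns, and a second softmax with a precision (playing the role of $\gamma$) to policy values. The policy values and all numerical settings are hypothetical; the returns are the example monetary returns to the self.

```python
import math


def softmax(values, precision=1.0):
    """Softmax of values scaled by an inverse temperature."""
    scaled = [precision * v for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Example returns to the self for outcomes cc, cd, dc, dd (self action first).
returns = [26, 10, 21, 18]
# Hypothetical values of two policies (e.g., cooperate vs. defect).
policy_values = [-1.0, -1.5]

# Sensitivity (beta): how sharply goals (priors over outcomes) favor high returns.
goals_weak = softmax(returns, precision=0.05)
goals_strong = softmax(returns, precision=0.5)

# Precision (gamma): how confidently the better policy is selected.
policy_unsure = softmax(policy_values, precision=1.0)
policy_confident = softmax(policy_values, precision=8.0)

print(goals_weak, goals_strong)
print(policy_unsure, policy_confident)
```

Raising the sensitivity sharpens which outcomes are desired, whereas raising the precision sharpens the choice among policies for a fixed set of policy values.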

Crucially, the precision of the probability distribution over 
alternative policies can itself be inferred in a Bayes-optimal sense. 
This represents a departure from classical formulations. It arises 
because we are formulating policy selection in terms of infer- 
ence. Choices are based upon beliefs (or inference) and beliefs in 
turn are held with greater or lesser confidence. The Bayes-optimal 
selection of precision over policies is a key thing that the cur- 
rent formulation brings to the table, above and beyond classical 
formulations. 

INTERPERSONAL REPRESENTATIONS AS MOTIVATING BELIEFS 

Our modeling demonstrates that the formulation of interpersonal 
representations in terms of plausible and desirable outcomes 
accommodates a number of psychological findings and points to 
interesting theoretical and empirical questions. 

First, our model replicates basic features of other successful 
models of interactive games. The 'esteem' traits in our model 
parallel the role of fairness-related coefficients in other 
models (Xiang et al., 2012). Second, our model infers the type of 
the partner (e.g., Figure 4A) and adjusts its policy so that it is 
not exploited (Figure 4B). Third, posterior beliefs are based 
upon a generative model that entails beliefs about beliefs (utility 
functions) of others. This endows the generative model with an 
elemental theory of mind. Furthermore, Bayesian inference about 
esteem, and therefore intentions, constitutes an elementary form 
of mentalizing (Allen et al., 2008). 

In our case the fact that interpersonal representations con- 
tribute to the agent's beliefs about the desirability of out- 
comes biases inference about states perceived and actions selected. 
The perceptual update in Figure 1 contains a contribution 
from precision. This is a remarkable effect of approximate 
Bayesian inference. In our example (Figure 4B) the result is 
that the agent is biased toward co-operativity, despite believ- 
ing that the other is as likely to be uncooperative as not 
(Figure 4A). This is an interpersonal analog of optimism bias, 
or 'giving the benefit of the doubt'. There is experimental evi- 
dence in the Trust task that beliefs about prosocial traits in 
the other result in preference structures akin to the proso- 
cial side of Figure 2A. When Investors are made to believe 
that the Trustee is of 'moral character' they entrust larger 
amounts (in our terms, cooperate in a sustained manner) even 
if the experimenter manipulates Trustee behavior so that the 
Investor does not make more money as a result (Delgado et al., 
2005). 

Our treatment suggests that interpersonal representations can 
help predict (and seek out) the outcomes of interactions. The idea 
that a self-esteem aspect of self- representation helps predict social 
outcomes is a central empirical finding of research by Leary and 
co-workers (Leary et al., 1995). Aspects of other-representation 
that help predict active social outcomes can be found in negative 
ideas about others that healthy people harbor in certain contexts. 
As mentioned, exaggerated suspicion about others can serve to 
manage false-negative errors in the detection of social difficul- 
ties (Kramer, 1994). Computationally, more sophisticated agents 
can predict interactions better. Under certain constraints, how- 
ever, interpersonal beliefs in the form of prosocial biases help 
achieve behavior that emulates such sophisticated thinking, a key 
theoretical finding of Yoshida and co-workers (Yoshida et al., 
2008). 

Interpersonal inference suggests that the use of self- 
representations to predict outcomes requires an assessment of 
context. In our Trust task, my partner and I can just consider 
one round in the future, provided we have inferred our types 
appropriately and, implicitly, the effective nature of the exchange 
(cooperative or competitive, etc.). Our simulation contains an 
interesting example of what happens if the wrong representations 
are assumed. The game is cooperative but, in our example, the 
other is antisocial (and unsophisticated). The other's preference, 
stemming from their negative "niceness" (esteem), is to defect 
while the self cooperates, followed by mutual cooperation. Note 
that this preference structure is the only element in our naive 
other's cognitive machinery. When the self infers this preference 
structure they switch to a more uncooperative policy, thus 
undermining the other's goals. Had the other been "nice" enough, or 
had they believed the self to be "nice" enough, the self would 
have inferred this and the other's predictions, or goals, would be 
fulfilled. 

We see that goals are not prescribed by immediate reward but 
by more generic beliefs. Clearly, there are an enormous number 
of forms for these beliefs that we could consider that help pre- 
dict and realize different outcomes in different contexts. In the 
present context, one might consider the long-term payoffs that 
accrue from a collaborative policy for the agent or for everybody. 
Crucially, collaboration entails a consilience in terms of proso- 
cial preferences or utility. The key thing about prosocial utility is 
that it can be symmetrical with respect to me and my opponent. 
For example, I may altruistically value the total reward accrued by 
myself and my opponent if I think they are prosocial, but only my 
own rewards if they are antisocial. In our simple illustration, and 
with the right choice of parameters, this would result in a very 
similar pattern of exchange to that seen in Figure 3. Alternatively, 
through some aversion to inequality, self might prefer equitable 
outcomes (irrespective of who gets most). 

In our simulations the effect of esteem operates like a social 
Pavlovian bias, biasing beliefs irrespective of their consequences. 
A Pavlovian bias enhances certain actions in certain contexts. For 
example, it enhances passivity in a context of threat or vigorous 
approach in a context of opportunity, irrespective of instrumental 
outcomes. Our social Pavlovian bias promotes certain actions in 
the context of certain personal esteems irrespective of instrumen- 
tal outcomes. Here, we chose a scheme of social Pavlovian biases 
that makes direct links between contemporary research into these 
fundamental biases (Guitart-Masip et al., 2012) and the large 
body of clinical- and social- psychological work on affectively 
charged representations of people. This work spans Aristotelian 
ethics and forensic psychotherapy (Gilligan, 2000) through to 
attribution theory (Thewissen et al., 2011). 

We placed emphasis on prior beliefs as they may absorb various 
beliefs about long-term outcomes. These utilitarian beliefs 
entail the agent's cognitive-affective horizon, beyond which the 
agent has no knowledge and no control. This contrasts with the 
dynamics of the exchange, wherein the agent has both beliefs 
about states and beliefs about control. We envisage that the 
present approach will help disentangle these two components in 
the setting of interpersonal dynamics. 

Although our ultimate aim is to study how self-representation 
is inferred under active inference, in this introductory study 
we have kept self-representation constant. Although we hope 
to examine this in future work, here we note that a Bayesian 
framework naturally predicts that ordinary self-representation 
should be less responsive to evidence than the representation 
of others. Setting aside beliefs about changeability of the self, 
as well as the real possibility that aspects of self-representation 
may be learnt "once and for all" during childhood, inference 
about self-representations must take place on the basis of a much 
greater evidence base than inference about strangers. Therefore 
each new piece of evidence is expected to have less impact on 
self-representation than other-representation. 

MODELING CHOICES, LIMITATIONS AND OUTLOOK 
What does my partner think of me? 

It may appear that we made a gross simplification in modeling 
the self using their own representation to estimate how the other 
sees the self. A more general formulation might be more 
conventional, where the beliefs of the self (self-representation and 
reputation with respect to others) are separate. Yet this modeling 
choice was not made merely to simplify the model. For example, 
clinical psychology indicates that beliefs about the self are highly 
correlated with beliefs about how others see the self. Moreover, 
patients with unwarranted beliefs about themselves and others 
that look "psychologically defensive" show no greater social 
desirability than healthy controls (Moutoussis et al., 2013). We suggest 
that the self uses beliefs about their esteem to model the beliefs of 
the other, a generalization of the "sociometer" theory with a view 
to testing the limits of this assumption's predictive power. 

Depth-of-thought 

Our model uses a very simple other, who makes no inferences 
about itself. Clearly, this is not a realistic simulation of other. 
Furthermore, our model self did not explicitly calculate 
distant outcomes before applying the prior "horizon." The latter is 
partly justified as most people look to the future to quite a 
limited extent. In the Trust Task, only about a quarter of Investors 
show up to two levels of recursive interpersonal thought (Xiang et 
al., 2012). Having said this, further work needs to consider agents 
that explicitly simulate outcomes for a small number of steps into 
the future and apply inference and preferences to patterns of such 
outcomes. 

Normative self-representations 

We envisage that self-representations would enter into the 
assessment of proximal gains in the light of long-term outcomes; for 
example, "What sort of person am I, if I treat the other player 
like this?"; "If that's the sort of person I am, how am I likely 
to be treated in the future?" This extension of the simple model 
above will be crucial if the other makes inferences about the self. 
Our long-term aim, to test the hypothesis that the normative role of 
self-representation is to predict the likely outcomes of social 
interactions, is likely to require such complex thinking. We envisage 
that beliefs about the opponent can, through conditional depen- 
dencies among Bayesian estimates about me and my opponent, 
affect beliefs about me. This may be crucial for understanding 
psychopathology in interpersonal exchange. 

Model parameterization 

We discussed above that interpersonal, affectively charged repre- 
sentations may be parameterized in a number of related ways. We 
chose a very simple parameterization for the purposes of demon- 
stration. In the light of a wider literature, the validity of different 
models for interpersonal representation and the relationships 
between them remain to be clarified. One important contribu- 
tion of formal models, of the sort we have introduced here, is that 
they can provide quantitative predictions of choice behavior. In 
principle, this means that one can use observed choices to esti- 
mate the parameters of a given model and — more importantly — 
use Bayesian model comparison to adjudicate between different 
forms or hypothetical schemes. 
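
As a rough illustration of how observed choices could be used to estimate a parameter and compare models, the Python sketch below scores two hypothetical models of the other (antisocial vs. prosocial esteem) against a made-up sequence of choices. The softmax form, the esteem levels of -1 and +1, the grid over the sensitivity, and the data are all assumptions for the example; the evidence is approximated by averaging the likelihood over a uniform grid, which is a crude stand-in for proper Bayesian model comparison.

```python
import math


def p_cooperate(beta, esteem, r_coop=26.0, r_defect=42.0):
    """Softmax probability that the other cooperates (assumed form)."""
    diff = (beta * r_coop + esteem) - (beta * r_defect - esteem)
    return 1.0 / (1.0 + math.exp(-diff))


def log_likelihood(choices, beta, esteem):
    """Log probability of a sequence of choices (1 = cooperate, 0 = defect)."""
    p = p_cooperate(beta, esteem)
    return sum(math.log(p if c == 1 else 1.0 - p) for c in choices)


def log_evidence(choices, esteem, betas):
    """Log marginal likelihood under a uniform prior over a grid of beta."""
    logs = [log_likelihood(choices, b, esteem) for b in betas]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) / len(logs))


choices = [0] * 11 + [1]                  # hypothetical data: mostly defection
betas = [0.01 * k for k in range(1, 21)]  # grid over the sensitivity beta

# Log Bayes factor in favor of the antisocial model over the prosocial one.
log_bf = log_evidence(choices, -1.0, betas) - log_evidence(choices, 1.0, betas)
# Maximum-likelihood sensitivity under the antisocial model.
best_beta = max(betas, key=lambda b: log_likelihood(choices, b, -1.0))
print(log_bf, best_beta)
```

For these made-up data the log Bayes factor is positive, so the antisocial model is preferred.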

SUMMARY 

In conclusion, we have sketched an elementary model of self and 
other representation during interpersonal exchange, within which 
these representations have important functional roles. We have 
seen that it is fairly straightforward to place optimal decision 
schemes in an active inference framework. This involves replacing 
optimal policies, defined by utility functions, with prior beliefs 
about outcomes. The advantage of doing this is that one can 
formulate action and perception as jointly minimizing the same 
objective function, which provides an upper bound on surprise 
or (negative log Bayesian) model evidence. This enables optimal 
control to be cast as a pure inference problem, with a clear dis- 
tinction between action and inference about (partially) observed 
outcomes. Using a simple example, we have demonstrated how 
desirable goals can embody and express prosocial preferences as 
well as beliefs about the type of an opponent. Specifically, we have 
shown how these beliefs can be updated during iterated play and 
how they can guide interpersonal choices. Although rudimen- 
tary, these simulations illustrate a formal basis for interpersonal 
inference. 